Use the Climate Data Catalog#
Having generated the catalog in the previous notebook, we can now put it to use!
Imports#
import intake
from distributed import Client, LocalCluster
import hvplot.xarray
import matplotlib.pyplot as plt
import holoviews as hv
hv.extension("bokeh")
Spin up a Dask Cluster#
cluster = LocalCluster()
client = Client(cluster)
client
Client: Client-16c33222-d252-11ec-9b4a-acde48001122, connected to a LocalCluster with 4 workers, 12 threads total, and 16.00 GiB of memory; dashboard at http://127.0.0.1:8787/status
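Here the cluster sized itself automatically (4 workers × 3 threads on a 12-core machine). If you need a specific layout, `LocalCluster` also accepts it explicitly; a minimal sketch, with illustrative parameter values:

```python
from distributed import Client, LocalCluster

# Pin the worker layout instead of relying on the defaults.
# processes=False uses threads, which is lighter for a quick demo.
cluster = LocalCluster(n_workers=2, threads_per_worker=1,
                       processes=False, dashboard_address=None)
client = Client(cluster)

# The scheduler reports one entry per worker
n_workers = len(client.scheduler_info()["workers"])

client.close()
cluster.close()
```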
Access the data#
We have an intake catalog we can read in!
data_catalog = intake.open_catalog('catalogs/test-catalog.yml')
data_catalog["cesm-test-dataset"]
cesm-test-dataset:
args:
storage_options:
fo: merged-data.json
remote_options:
token: anon
remote_protocol: s3
urlpath: reference://
description: CESM Test Dataset
driver: intake_xarray.xzarr.ZarrSource
metadata:
catalog_dir: /Users/mgrover/git_repos/cloud-for-climate/notebooks/catalogs/
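For reference, an entry like this corresponds to a YAML catalog file along the following lines. This is a sketch reconstructed from the repr above; the actual `catalogs/test-catalog.yml` may differ in detail:

```yaml
sources:
  cesm-test-dataset:
    description: CESM Test Dataset
    driver: intake_xarray.xzarr.ZarrSource
    args:
      urlpath: reference://
      storage_options:
        fo: merged-data.json
        remote_protocol: s3
        remote_options:
          token: anon
```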
Load the Data Using Dask#
ds = data_catalog["cesm-test-dataset"].to_dask()
ds
<xarray.Dataset>
Dimensions: (time: 12, lat: 192, ilev: 71, lev: 70, lon: 288,
nbnd: 2, zlon: 1)
Coordinates:
* ilev (ilev) float64 4.5e-06 7.42e-06 ... 985.1 1e+03
* lat (lat) float64 -90.0 -89.06 -88.12 ... 89.06 90.0
* lev (lev) float64 5.96e-06 9.827e-06 ... 976.3 992.6
* lon (lon) float64 0.0 1.25 2.5 ... 356.2 357.5 358.8
* time (time) object 2035-02-01 00:00:00 ... 2036-01-01...
* zlon (zlon) float64 0.0
Dimensions without coordinates: nbnd
Data variables: (12/36)
P0 (time) float64 dask.array<chunksize=(1,), meta=np.ndarray>
ch4vmr (time) float64 dask.array<chunksize=(12,), meta=np.ndarray>
co2vmr (time) float64 dask.array<chunksize=(12,), meta=np.ndarray>
date (time) float64 dask.array<chunksize=(12,), meta=np.ndarray>
date_written (time) object dask.array<chunksize=(1,), meta=np.ndarray>
datesec (time) float64 dask.array<chunksize=(12,), meta=np.ndarray>
... ...
sol_tsi (time) float64 dask.array<chunksize=(12,), meta=np.ndarray>
time_bnds (time, nbnd) object dask.array<chunksize=(1, 2), meta=np.ndarray>
time_written (time) object dask.array<chunksize=(1,), meta=np.ndarray>
wet_deposition_NHx_as_N (time, lat, lon) float32 dask.array<chunksize=(1, 192, 288), meta=np.ndarray>
wet_deposition_NOy_as_N (time, lat, lon) float32 dask.array<chunksize=(1, 192, 288), meta=np.ndarray>
zlon_bnds (time, zlon, nbnd) float64 dask.array<chunksize=(1, 1, 2), meta=np.ndarray>
Attributes:
Conventions: CF-1.0
case: b.e21.BW.f09_g17.SSP245-TSMLT-GAUSS-LOWER-0.5.001
host:
initial_file: b.e21.BWSSP245cmip6.f09_g17.CMIP6-SSP2-4.5-WACCM.001.c...
logname: geostrat
model_doi_url: https://doi.org/10.5065/D67H1H0V
source: CAM
time_period_freq: month_1
    topography_file:   /scratch/geostrat/inputdata/atm/cam/topo/fv_0.9x1.25_n...

Investigate our Dataset#
Let’s investigate our dataset!
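Keep in mind that every variable above is a lazy dask array: nothing is read from S3 until values are actually needed. A minimal sketch of that workflow, using a small synthetic stand-in for `ds` with the same dimensions and chunking as `wet_deposition_NHx_as_N` (the values here are dummy data, not the real dataset):

```python
import numpy as np
import xarray as xr

# Synthetic stand-in: same dims/chunking as wet_deposition_NHx_as_N
demo = xr.Dataset(
    {"wet_deposition_NHx_as_N": (("time", "lat", "lon"),
                                 np.ones((12, 192, 288), dtype="float32"))}
).chunk({"time": 1})

lazy = demo.wet_deposition_NHx_as_N.mean(("lat", "lon"))  # still lazy
monthly_means = lazy.compute()  # this triggers the dask graph
```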
Plot Using Matplotlib#
We can start with a single time step
ds.wet_deposition_NHx_as_N.isel(time=0).plot();
And a single point
ds.wet_deposition_NHx_as_N.sel(lat=41.8781,
                               lon=-87.6298 % 360,  # lon runs 0-360 here
                               method='nearest').plot()
plt.title('NHx Wet Deposition near Chicago, IL')
Text(0.5, 1.0, 'NHx Wet Deposition near Chicago, IL')
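One thing to watch in point selections like this: the dataset's `lon` coordinate runs 0 to 358.75, so a western-hemisphere longitude given as a negative number must be wrapped into that range first; otherwise `method='nearest'` silently snaps to `lon=0.0`. A quick sketch:

```python
# Wrap a -180..180 longitude into the dataset's 0..360 convention
chicago_lon = -87.6298 % 360  # Python's % always returns a non-negative result here
```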
Plot Using hvPlot#
Let’s use an interactive plotting library!
We can start with a single time step
ds.wet_deposition_NHx_as_N.isel(time=0).hvplot(cmap='reds')
And a single point
ds.wet_deposition_NHx_as_N.sel(lat=41.8781,
                               lon=-87.6298 % 360,  # lon runs 0-360 here
                               method='nearest').hvplot.line(title='NHx Wet Deposition near Chicago, IL')
WARNING:param.CurvePlot02951: Converting cftime.datetime from a non-standard calendar (noleap) to a standard calendar for plotting. This may lead to subtle errors in formatting dates, for accurate tick formatting switch to the matplotlib backend.